Closed form word embedding alignment
Authors
Sunipa Dev, Safia Hassan, Jeff M. Phillips
Abstract
We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as absolute orientation, which are popular for aligning objects in three dimensions, and they generalize an approach by Smith et al. (ICLR 2017). We prove new results for optimal scaling and for maximizing cosine similarity. Then, we demonstrate how to evaluate the similarity of embeddings from different sources or mechanisms, and that certain properties like synonyms and analogies are preserved across the embeddings and can be enhanced by simply aligning and averaging ensembles of embeddings.
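As a concrete illustration of the closed-form alignment the abstract describes, the sketch below centers both embeddings (optimal translation), solves orthogonal Procrustes via an SVD (optimal rotation), and applies the closed-form optimal scale. This is a minimal sketch of the standard absolute-orientation solution, not the authors' released code; the names A, B, and align are illustrative.

# Minimal sketch of closed-form embedding alignment (absolute orientation).
# Assumes numpy; A, B, and align() are illustrative names, not the paper's code.
import numpy as np

def align(A, B):
    """Optimally translate, rotate, and scale B onto A in closed form.
    A and B are n x d arrays whose rows embed the same vocabulary."""
    a_mean, b_mean = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - a_mean, B - b_mean           # translation: center both
    U, S, Vt = np.linalg.svd(B0.T @ A0)       # rotation: orthogonal Procrustes
    R = U @ Vt                                # RMSE-optimal orthogonal map
    s = S.sum() / (B0 ** 2).sum()             # closed-form optimal scale
    return s * (B0 @ R) + a_mean

# Sanity check: recover a rotated, scaled, and shifted copy of an embedding.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 50))
Q = np.linalg.qr(rng.normal(size=(50, 50)))[0]   # random orthogonal matrix
B = 2.0 * A @ Q + 0.5
print(np.sqrt(((align(A, B) - A) ** 2).mean()))  # ~0: RMSE after alignment

Here centering handles the translation, the SVD gives the RMSE-optimal orthogonal map, and the scale (the sum of singular values divided by the squared Frobenius norm of the centered source) is the closed-form optimum the abstract refers to.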
Similar Resources
Consistent Alignment of Word Embedding Models
Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as clustering similar words and inferring learning relationships, many challenges and open research questions remain. In this paper, we propose a solution that...
Bayesian Neural Word Embedding
Recently, several works in the domain of natural language processing presented successful methods for word embedding. Among them, the Skip-Gram (SG) with negative sampling, known also as word2vec, advanced the state-of-the-art of various linguistics tasks. In this paper, we propose a scalable Bayesian neural word embedding algorithm that can be beneficial to general item similarity tasks as well...
Word to word alignment strategies
Word alignment is a challenging task aiming at the identification of translational relations between words and multi-word units in parallel corpora. Many alignment strategies are based on links between single words. Different strategies can be used to find the optimal word alignment using such one-to-one word links including relations between multi-word units. In this paper seven algorithms are ...
Category Enhanced Word Embedding
Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words that present similar co-occurrence statistics. Besides local occurrence statistics, global topical information is also important knowledge that may help discrim...
Semantic Word Embedding (SWE)
Table of contents: Chapter 1, Semantic Word Embedding; 1.1 The Skip-gram model; 1.2 SWE as Constrained Optimization; ...
Journal
Title: Knowledge and Information Systems
Year: 2021
ISSN: 0219-1377, 0219-3116
DOI: https://doi.org/10.1007/s10115-020-01531-7